Quantifying copy number variations using a hidden Markov model with inhomogeneous emission distributions.
Abstract
Copy number variations (CNVs) are a significant source of genetic variation and are frequently associated with diseases such as cancers and autism. High-throughput sequencing data are increasingly being used to detect and quantify CNVs; however, the distributional properties of the data are not fully understood. A hidden Markov model (HMM) is proposed that uses inhomogeneous emission distributions based on negative binomial regression to account for sequencing biases. The model is tested on whole-genome sequencing data and simulated data sets. An algorithm for CNV detection is implemented in the R package CNVfinder. The model based on negative binomial regression is shown to fit the data well and to perform competitively with methods based on normalization of read counts.
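To make the idea of inhomogeneous emissions concrete, here is a minimal Python sketch. It is not the CNVfinder implementation or its API. It assumes a single covariate (GC content per window), a log-link mean scaled by copy number over two, and a fixed dispersion. The function names, the candidate copy-number states, the transition probability `stay`, and all parameter values are hypothetical choices made for illustration. Each window has its own negative binomial mean, and Viterbi decoding recovers the most likely copy-number path.

```python
import math

def nb_logpmf(y, mu, size):
    """Log pmf of a negative binomial with mean mu and dispersion parameter size."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))

def window_mean(copy_number, gc, beta0, beta1):
    """Assumed regression: diploid mean exp(beta0 + beta1 * gc), scaled by copy_number / 2."""
    dosage = max(copy_number, 0.01) / 2.0  # small floor keeps the mean positive for 0 copies
    return dosage * math.exp(beta0 + beta1 * gc)

def viterbi_cnv(counts, gc, states, beta0, beta1, size, stay=0.99):
    """Most likely copy-number path; emissions differ per window through gc."""
    n_states = len(states)
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (n_states - 1))

    def emit(t, s):
        mu = window_mean(states[s], gc[t], beta0, beta1)
        return nb_logpmf(counts[t], mu, size)

    def trans(p, s):
        return log_stay if p == s else log_switch

    # Uniform initial distribution over the candidate states.
    score = [math.log(1.0 / n_states) + emit(0, s) for s in range(n_states)]
    back = []
    for t in range(1, len(counts)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + trans(p, s))
            new_score.append(score[best_prev] + trans(best_prev, s) + emit(t, s))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]

# Toy data: a one-copy deletion in the middle of a diploid region.
gc = [0.30 + 0.02 * (t % 10) for t in range(18)]
truth = [2] * 6 + [1] * 6 + [2] * 6
beta0, beta1, size = math.log(100.0), 0.5, 50.0  # hypothetical parameter values
counts = [round(window_mean(c, g, beta0, beta1)) for c, g in zip(truth, gc)]
path = viterbi_cnv(counts, gc, [1, 2, 3], beta0, beta1, size)
print(path)
```

In this sketch the regression coefficients are taken as given. In practice they would be estimated from the data, for example alternately with the state assignments, and that estimation step is not shown here.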
Similar resources
Hidden Markov Model-Based CNV Detection Algorithms for Illumina Genotyping Microarrays
Somatic alterations in DNA copy number have been well studied in numerous malignancies, yet the role of germline DNA copy number variation in cancer is still emerging. Genotyping microarrays generate allele-specific signal intensities to determine genotype, but may also be used to infer DNA copy number using additional computational approaches. Numerous tools have been developed to analyze Illu...
Hardy-Weinberg equilibrium revisited for inferences on genotypes featuring allele and copy-number variations
Copy number variations represent a substantial source of genetic variation and are associated with a plethora of physiological and pathophysiological conditions. Joint copy number and allelic variations (CNAVs) are difficult to analyze and require new strategies to unravel the properties of genotype distributions. We developed a Bayesian hidden Markov model (HMM) approach that allows dissecting...
Bayesian non-parametric hidden Markov models with applications in genomics
We propose a flexible non-parametric specification of the emission distribution in hidden Markov models and we introduce a novel methodology for carrying out the computations. Whereas current approaches use a finite mixture model, we argue in favour of an infinite mixture model given by a mixture of Dirichlet processes. The computational framework is based on auxiliary variable representations o...
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
Using a hidden Markov model to predict new tuberculosis cases in Hamadan Province based on cases registered during the years 1384-1394 (Solar Hijri calendar)
Background and Objectives: Tuberculosis is a chronic bacterial disease and a major cause of morbidity and mortality. It is caused by Mycobacterium tuberculosis. Awareness of the incidence and number of new cases of the disease is valuable information for revising the implemented programs and development indicators. Time series and regression are commonly used models for prediction but these m...
Journal title:
- Biostatistics
Volume 14, Issue 3
Pages: -
Publication date: 2013